{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bafef919",
   "metadata": {},
   "source": [
    "(genai-02-mm-llm)=\n",
    "# Model monitoring using LLM\n",
    "\n",
    "This tutorial illustrates a model monitoring system that leverages LLMs to maintain high standards for deployed models.\n",
    "\n",
    "**In this tutorial**\n",
    "- [Prerequisites](#prerequisites)\n",
    "- [Add the monitoring-function code](#add-the-monitoring-function-code)\n",
    "- [Deploy the model, enable tracking, and deploy the function](#deploy-the-model-enable-tracking-and-deploy-the-function)\n",
    "\n",
    "This tutorial explains how an LLM can be monitored. To see it in action, run the [Large Language Model Monitoring](https://github.com/mlrun/demo-monitoring-and-feedback-loop/blob/main/README.md) demo.\n",
    "\n",
    "## Prerequisites\n",
    "- GPU node with NVIDIA drivers is necessary for the serving function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "33d4d744-4452-465f-96d0-36aa71ceb463",
   "metadata": {},
   "outputs": [],
   "source": [
    "import mlrun\n",
    "from mlrun.features import Feature"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01c54e77",
   "metadata": {},
   "source": [
    "Create the project"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4825fbe2-89d8-4687-978e-0f4d76684353",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> 2025-09-12 04:27:22,190 [info] Project loaded successfully: {\"project_name\":\"genai-tutorial-xingsheng\"}\n"
     ]
    }
   ],
   "source": [
    "project = mlrun.get_or_create_project(\n",
    "    \"genai-tutorial\", user_project=True, allow_cross_project=True\n",
    ")\n",
    "project.set_source(\".\", pull_at_runtime=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39914102",
   "metadata": {},
   "source": [
    "Set the credentials"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90698573-093a-4050-870a-c2341b4862de",
   "metadata": {},
   "outputs": [],
   "source": [
    "from src.model_monitoring_utils import enable_model_monitoring\n",
    "\n",
    "# If this project was running with MM enabled pre-1.8.0, disable the old model monitoring to update configurations\n",
    "project.disable_model_monitoring(delete_stream_function=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f2c7533",
   "metadata": {},
   "source": [
    "Enable model monitoring for the project"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16cc6980-47d9-487a-8a98-cb9e3282c23c",
   "metadata": {},
   "outputs": [],
   "source": [
    "enable_model_monitoring(project=project, base_period=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de62f740",
   "metadata": {},
   "source": [
    "## Add the monitoring-function code\n",
    "\n",
    "The monitoring function code collects the traffic to the serving function, analyzes it, and generates results for the specified metric.\n",
    "\n",
    "```{admonition} Note\n",
    "You can also import model monitoring applications from the [MLRun hub](https://www.mlrun.org/hub/). Each application has complete usage instructions.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "96df62bc-555c-4700-b118-65b0ccacb913",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile monit-code.py\n",
    "import re\n",
    "from typing import Any, Union\n",
    "\n",
    "import mlrun\n",
    "import mlrun.common.schemas\n",
    "from mlrun.model_monitoring.applications import (\n",
    "    ModelMonitoringApplicationBase,\n",
    "    ModelMonitoringApplicationResult,\n",
    ")\n",
    "\n",
    "STATUS_RESULT_MAPPING = {\n",
    "    0: mlrun.common.schemas.model_monitoring.constants.ResultStatusApp.detected,\n",
    "    1: mlrun.common.schemas.model_monitoring.constants.ResultStatusApp.no_detection,\n",
    "}\n",
    "\n",
    "\n",
    "class LLMMonitoringFunction(ModelMonitoringApplicationBase):\n",
    "\n",
    "    def do_tracking(\n",
    "        self,\n",
    "        monitoring_context,\n",
    "    ) -> Union[\n",
    "        ModelMonitoringApplicationResult, list[ModelMonitoringApplicationResult]\n",
    "    ]:\n",
    "        \n",
    "        # User monitoring sampling, in this case an integer representing model performance\n",
    "        # Can be calulated based off the traffic to the function using monitoring_context.sample_df\n",
    "        result = 0.9\n",
    "\n",
    "        monitoring_context.log_dataset(\n",
    "            key=\"llm-monitoring-df\",\n",
    "            df=monitoring_context.sample_df\n",
    "        )\n",
    "\n",
    "        # get status:\n",
    "        status = STATUS_RESULT_MAPPING[round(result)]\n",
    "\n",
    "        return ModelMonitoringApplicationResult(\n",
    "            name=\"llm_monitoring_df\",\n",
    "            value=result,\n",
    "            kind=mlrun.common.schemas.model_monitoring.constants.ResultKindApp.model_performance,\n",
    "            status=status,\n",
    "            extra_data={},\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20d1ac66",
   "metadata": {},
   "source": [
    "Define the model monitoring custom function that scans the traffic and calculates the performance metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a912e3b0-0818-451f-8c52-7aa6a2a101d0",
   "metadata": {},
   "outputs": [],
   "source": [
    "application = project.set_model_monitoring_function(\n",
    "    func=\"monit-code.py\",\n",
    "    application_class=\"LLMMonitoringFunction\",\n",
    "    name=\"llm-monit\",\n",
    "    image=\"mlrun/mlrun\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d73fc5b4-dae1-46ae-9bc8-8dc496da6c52",
   "metadata": {},
   "outputs": [],
   "source": [
    "application.spec.readiness_timeout = 1200"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1c627200-07a2-4846-a67c-264bfd3670c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "project.deploy_function(application)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ed1506f",
   "metadata": {},
   "source": [
    "Create a model serving class that loads the LLM and generates responses"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ed8f7f4-c64b-4065-96ef-697f0c346675",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile model-serving.py\n",
    "import mlrun\n",
    "from mlrun.serving.v2_serving import V2ModelServer\n",
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "from typing import Any\n",
    "\n",
    "class LLMModelServer(V2ModelServer):\n",
    "\n",
    "    def __init__(\n",
    "        self,\n",
    "        context: mlrun.MLClientCtx = None,\n",
    "        name: str = None,\n",
    "        model_path: str = None,\n",
    "        model_name: str = None,\n",
    "        **kwargs\n",
    "    ):\n",
    "        super().__init__(name=name, context=context, model_path=model_path, **kwargs)\n",
    "        self.model_name = model_name\n",
    "    \n",
    "    def load(\n",
    "        self,\n",
    "    ):\n",
    "        # Load the model from Hugging Face\n",
    "        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)\n",
    "        self.model = AutoModelForCausalLM.from_pretrained(self.model_name)\n",
    "\n",
    "\n",
    "    def predict(self, request: dict[str, Any]):\n",
    "        inputs = request.get(\"inputs\", [])\n",
    "      \n",
    "        input_ids, attention_mask = self.tokenizer(\n",
    "            inputs[0], return_tensors=\"pt\"\n",
    "        ).values()\n",
    "\n",
    "        outputs = self.model.generate(input_ids=input_ids, attention_mask=attention_mask)\n",
    "\n",
    "        # Remove input:\n",
    "        outputs = self.tokenizer.decode(outputs[0])\n",
    "        outputs = outputs.split(inputs[0])[-1].replace(self.tokenizer.eos_token, \"\")\n",
    "        return [{\"generated_text\": outputs}]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b89a05cf-03f9-45a5-af69-5265bd386b24",
   "metadata": {},
   "source": [
    "Build an image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02b8f199-eeb1-4ce5-8808-2451845cfba7",
   "metadata": {},
   "outputs": [],
   "source": [
    "commands = [\n",
    "    \"pip install pytorch-lightning\",\n",
    "    \"pip install packaging==21.3\",\n",
    "    \"pip install transformers adapters openai\",\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb4c1c93-484d-4d3d-9659-a6995b4ebcb2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# need different version of protobuf to work with different python version, python 3.9 -> protobuf 3.20.2, python 3.11 -> protobuf latest\n",
    "import sys\n",
    "\n",
    "minor_version = float(sys.version_info[1])\n",
    "if minor_version >= 11:\n",
    "    commands.append(\"pip install --upgrade --force-reinstall protobuf\")\n",
    "elif minor_version == 9:\n",
    "    commands.append(\"pip install protobuf==3.20.2\")\n",
    "else:\n",
    "    print(f\"minor_version {minor_version} not supported\")\n",
    "commands"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bde7efcf-0661-45b4-998b-db47f2df69ba",
   "metadata": {},
   "source": [
    "Run the following command to build the image, once it's successfully built, no need to run the cell again since it takes quite sometime to build an image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69f67e8f-4d15-4a2c-8331-e0b25f900818",
   "metadata": {},
   "outputs": [],
   "source": [
    "project.build_image(\n",
    "    image=\".llm-serving-base\",\n",
    "    base_image=\"mlrun/mlrun\",\n",
    "    set_as_default=False,\n",
    "    commands=commands,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71ad3ab8",
   "metadata": {},
   "source": [
    "Create the serving function using the class you just defined"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a94ab4eb-524f-4709-9354-53d8cb67cb98",
   "metadata": {},
   "outputs": [],
   "source": [
    "serving_fn = project.set_function(\n",
    "    func=\"model-serving.py\",\n",
    "    name=\"llm-server\",\n",
    "    kind=\"serving\",\n",
    "    image=\".llm-serving-base\",\n",
    ")\n",
    "\n",
    "# Set readiness timeout to 20 minutes, deploy might take a while.\n",
    "serving_fn.spec.readiness_timeout = 1200\n",
    "\n",
    "# Attach fuse mount to the function if not in ce mode\n",
    "if not mlrun.mlconf.is_ce_mode():\n",
    "    serving_fn.apply(mlrun.auto_mount())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8bcec42",
   "metadata": {},
   "source": [
    "## Deploy the model, enable tracking, and deploy the function\n",
    "\n",
    "This tutorial uses the gpt2 model by Google. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f048d80",
   "metadata": {},
   "source": [
    "Log the model to the project"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a983b01-3be4-4195-852a-a950fee0d5c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "base_model = \"gpt2\"\n",
    "project.log_model(\n",
    "    base_model,\n",
    "    model_file=\"src/model-iris.pkl\",\n",
    "    inputs=[Feature(value_type=\"str\", name=\"question\")],\n",
    "    outputs=[Feature(value_type=\"str\", name=\"answer\")],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f713b7d9",
   "metadata": {},
   "source": [
    "Adding the model parameters to the endpoint. This allow the model server class to initialize."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ea3139a",
   "metadata": {},
   "outputs": [],
   "source": [
    "serving_fn.add_model(\n",
    "    \"gpt2\",\n",
    "    class_name=\"LLMModelServer\",\n",
    "    model_path=f\"store://models/{project.name}/gpt2:latest\",\n",
    "    model_name=\"gpt2\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00bc60ae",
   "metadata": {},
   "source": [
    "Enable tracking for the function, then deploy it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "58575f9f-4d41-4408-95af-9db52dd3bd29",
   "metadata": {},
   "outputs": [],
   "source": [
    "serving_fn.set_tracking()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95259455-0043-4576-b361-a91cda808df4",
   "metadata": {},
   "outputs": [],
   "source": [
    "deployment = serving_fn.deploy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "05307d65-4371-4f67-a12c-4a7a7ca2dbd7",
   "metadata": {},
   "outputs": [],
   "source": [
    "ret = serving_fn.invoke(\n",
    "    path=f\"/v2/models/{base_model}/infer\",\n",
    "    body={\"inputs\": [\"What is a mortgage?\"]},\n",
    ")\n",
    "ret"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b659ea0e-210c-4f68-893d-f6ad94236161",
   "metadata": {},
   "source": [
    "Test your model serving"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4edc4f51-8783-41b7-a69b-99caeb91cb90",
   "metadata": {},
   "source": [
    "Let's generate traffic against the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cd672923-f7ea-46b6-9560-cfb784362edc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "\n",
    "def question_model(questions, serving_function, base_model):\n",
    "    for question in questions:\n",
    "        seconds = 0.5\n",
    "        # Invoking the pretrained model:\n",
    "        ret = serving_fn.invoke(\n",
    "            path=f\"/v2/models/{base_model}/infer\",\n",
    "            body={\"inputs\": [question]},\n",
    "        )\n",
    "        print(ret)\n",
    "        time.sleep(seconds)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f47d779b-2296-4a91-8f2c-7d50fa93e6b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "example_questions = [\n",
    "    \"What is a mortgage?\",\n",
    "    \"How does a credit card work?\",\n",
    "    \"Who painted the Mona Lisa?\",\n",
    "    \"Please plan me a 4-days trip to north Italy\",\n",
    "    \"Write me a song\",\n",
    "    \"How much people are there in the world?\",\n",
    "    \"What is climate change?\",\n",
    "    \"How does the stock market work?\",\n",
    "    \"Who wrote 'To Kill a Mockingbird'?\",\n",
    "    \"Please plan me a 3-day trip to Paris\",\n",
    "    \"Write me a poem about the ocean\",\n",
    "    \"How many continents are there in the world?\",\n",
    "    \"What is artificial intelligence?\",\n",
    "    \"How does a hybrid car work?\",\n",
    "    \"Who invented the telephone?\",\n",
    "    \"Please plan me a week-long trip to New Zealand\",\n",
    "    \"What is inflation?\",\n",
    "    \"How do vaccines work?\",\n",
    "    \"Who discovered gravity?\",\n",
    "    \"Please plan me a weekend trip to Tokyo.\",\n",
    "    \"Write me a short story about a time traveler.\",\n",
    "    \"How many planets are in the solar system?\",\n",
    "    \"What is quantum physics?\",\n",
    "    \"How does a dishwasher work?\",\n",
    "    \"Who wrote '1984'?\",\n",
    "    \"Please plan me a 5-day road trip through California.\",\n",
    "    \"Write me a haiku about autumn.\",\n",
    "    \"What is the tallest mountain in the world?\",\n",
    "    \"How does cryptocurrency work?\",\n",
    "    \"Who invented the light bulb?\",\n",
    "    \"What is the meaning of photosynthesis?\",\n",
    "    \"How does an airplane fly?\",\n",
    "    \"Who painted 'The Starry Night'?\",\n",
    "    \"Please plan me a 10-day trip across South America.\",\n",
    "    \"Write me a letter to apologize to a friend.\",\n",
    "    \"How many countries are there in the world?\",\n",
    "    \"What is renewable energy?\",\n",
    "    \"How does Wi-Fi work?\",\n",
    "    \"Who directed the movie 'Inception'?\",\n",
    "    \"Please plan me a cultural tour of Egypt.\",\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95d67d09-e287-49ee-9800-1a05e46e8043",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "question_model(\n",
    "    questions=example_questions,\n",
    "    serving_function=serving_fn,\n",
    "    base_model=base_model,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df7bff62",
   "metadata": {},
   "source": [
    "Now the traffic to the function is analyzed and the performance is calculated."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "mlrun-base",
   "language": "python",
   "name": "conda-env-mlrun-base-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
